Abstract. In a 2002 paper, Che and co-authors proposed a simple approach
for estimating the hit rates of a cache operating the least recently used (LRU)
replacement policy. The approximation proves remarkably accurate and is
applicable to quite general distributions of object popularity. This paper provides
a mathematical explanation for the success of the approximation, notably in
configurations where the intuitive arguments of Che et al. clearly do not
apply. The approximation is particularly useful in evaluating the performance of
current proposals for an information-centric network where other approaches
fail due to the very large populations of cacheable objects to be taken into
account and to their complex popularity law, resulting from the mix of different
content types and the filtering effect induced by the lower layers in a cache
hierarchy.



1. Introduction 

The investigation of so-called information-centric networking (ICN) architectures
is bringing renewed interest in the performance of caching. It is particularly
important to understand the potential for trading off bandwidth for memory by
implementing a network of caches and to develop tools that enable the optimization of
such a network. The ICN application places particularly stringent requirements on
evaluation tools since the population of content items available via the Internet is
immense and caches are required to store content of diverse types, each type being
distinguished by its peculiar popularity characteristics.

In recent work on cache performance in the context of ICN [7], we applied a tool
from the literature that was particularly well adapted to these requirements. This is an
approximation for evaluating the hit rates of a cache under the least recently used
(LRU) replacement policy proposed by Che, Tung and Wang in a 2002 paper [3].
The "Che approximation" proved extremely accurate, even in conditions where the
authors' intuitive arguments were clearly not justified. The objective of the present
paper is to provide more rigorous mathematical arguments allowing the scope of
the approximation to be more clearly defined.

The Che approximation applies to the following model. Users request items
from a population of N objects, first testing to see if the object is present in a
cache of capacity C. If the object is present it is returned to the user. If not, it is
obtained from some other source and copied to the cache as it is returned to the
user. This object replaces the one that was least recently requested. The probability
that a request is for object n, for $1 \le n \le N$, is proportional to some popularity $q(n)$,
independently of all past requests.

The hit rate $h(n)$ for object n, i.e., the probability this object is present in the
cache, is approximated by

$$h(n) \approx 1 - e^{-q(n) t_C},$$

where $t_C$ is the unique root of the equation

$$C = \sum_{n=1}^{N} \left(1 - e^{-q(n) t}\right).$$

Related work 

There is clearly a huge amount of related work on the performance of caching.
We restrict ourselves here to a discussion of the papers most relevant to our work.
Cited papers [3] and [12] may be consulted for summaries of significant early work.

Dan and Towsley [4] derived an iterative algorithm for calculating approximate
hit rates for a cache of size C using the hit rates for a cache of size C − 1.
Complexity is O(CN), which can be prohibitive in ICN applications where N and C are
very large. Dan and Towsley also propose a scheme comparable in complexity to
the Che approximation for computing hit rates under FIFO replacement (i.e., the
object replaced is the one that has been in the cache the longest). Recent work by
Rosensweig et al. applies the LRU algorithm of [4] to analyse general networks of
caches [16].

Jelenkovic provides closed-form asymptotic hit rate estimates for particular choices
of popularities $q(n)$ [12]. These are, namely, a generic light-tailed law, $q(n) = e^{-\lambda n^{\beta}}$
for n > 0, where $\lambda$ and $\beta$ are positive constants, and a generalized Zipf law,
$q(n) = 1/n^{\alpha}$ for n > 0, with $\alpha > 1$. The latter law with $\alpha < 1$ and $\alpha = 1$ is
considered by Jelenkovic et al., who show that hit rates can be expressed
in terms of a parameter defined as the root of a certain equation. The main
disadvantage is that the derived formulas are only applicable to particular popularity
laws.

In the following, we first discuss the considered traffic model in Section 2 before
presenting the Che approximation in detail in Section 3. Reasons for its remarkable
accuracy are elucidated in Section 4 while Section 5 derives explicit results for the
special case of Zipf popularity. An approximation for random replacement similar
to the Che approximation is derived in Section 6 and used in Section 7 in an ICN
application.

2. Traffic model 

We recall the independent reference model (IRM), discuss the nature of the
popularity law $q(\cdot)$ and argue that the IRM is appropriate for modelling an
information-centric network.

2.1. The independent reference model. Requests for objects occur in an
infinite sequence where the object index required on the i-th request, for i > 0, is an
independent random variable on {1, 2, ..., N} with a common probability
distribution. Specifically, the probability the required object has index n is proportional to
$q(n)$, for $1 \le n \le N$. We refer to $q(\cdot)$ as the popularity law.

An alternative description of this stochastic process is as follows (see Fill and Holst [6], for
example). Let $(\tau_n)$ be a sequence of independent exponential random variables with
respective rates $(q(n))$. At any arbitrary instant, the next object to be requested
is $n_0$, where $n_0$ is the index such that $\tau_{n_0}$ is the minimum of the $(\tau_n)$.

Figure 1. A filtered popularity law: $q'(\cdot)$ is the popularity law for
requests overflowing a size 1000 cache when $q(\cdot)$ is Zipf(0.8) and
N = 10000.



2.2. Popularity laws. Cache performance depends crucially on the popularity
law $q(\cdot)$. It is usual to order objects in order of decreasing popularity such that
$q(1) \ge q(2) \ge \dots \ge q(N)$. With this convention, the most frequently observed
popularity law is a generalized Zipf law: $q(n) \sim 1/n^{\alpha}$ with $\alpha > 0$.

Examples of content types with reported Zipf law behaviour are web pages [1],
[13], files shared using BitTorrent [7], YouTube documents [5], [2], video on demand
movies [19]. The reason why content popularity follows the Zipf law remains
unclear though the discussion by Mitzenmacher on generative models for power laws
provides some possible explanations [15]. The estimated value of $\alpha$ is often around
0.8 though cases with $\alpha > 1$ have been observed.

In fact, agreement between observations and the Zipf law is not always entirely
convincing. Sometimes the popularity law has a lighter tail where the least popular
objects are very unlikely to be requested. As a simple example of a light-tailed
law, we consider geometric popularity: $q(n) = p^n$. This choice is not based on
any measurement results but is made rather with the intention of stressing the Che
approximation: the approximation is in fact exact for a uniform probability law,
$q(n) =$ constant; it is more likely to fail as the law becomes more accentuated, as
with the geometric law.

While a Zipf law with suitable $\alpha$ might be a reasonable representation for a
homogeneous set of content, caches in the Internet must be designed for a traffic
mix. The popularity law should reflect this mix by weighting single type popularities
by the proportion of requests due to that type. It is necessary also to account for
significant differences in the size of objects of different types (see Section 3.2).

In a network, there is typically a hierarchy where higher layer caches receive
requests only for objects that are not found in lower layers. The popularity law is
thus a filtered version of the initial law. Figure 1 shows how the popularity law at
a second layer cache is deformed by the first layer.

Fortunately, the Che approximation is sufficiently versatile to account for both
composite popularity laws and filtering [7]. Note that it is not necessary to order
objects in decreasing order of popularity or to normalize the $q(n)$ and we do not
make these assumptions in the following analysis.

2.3. Validity of the independent reference model. The independent
reference model is a convenient abstraction that allows analytical modelling where a
more accurate representation of reality would be intractable. It is important to
understand the limitations of this model.

Adopting the IRM is to assume popularity does not change. This is clearly 
not true for content where, not only the popularity of a given object, but also the 
catalogue of available objects change over time. This is a manifestation of temporal 
locality. The IRM may still be considered acceptable if popularity variations are 
slow compared to the time scale of cache churn. Non-stationarity has a greater 
impact on the accuracy of measurements of popularity. For instance, statistics 
gathered over a period of weeks will hardly be representative of the popularity of a 
catch-up TV show that is only on-line for a few days. 

A second cause of error in predicting popularity laws is spatial locality. Most 
types of content will have strong regional bias with, for instance, a network in 
France observing high popularity for movies in French. However, this does not so 
much invalidate the IRM assumption as require more care in specifying popularity 
laws. 

The IRM is reasonable when content requests are generated independently by a
large population of users. This is not the case, however, for requests seen by caches
at second and higher layers in a hierarchy. The request process overflowing from
lower layer caches is correlated. The IRM is nevertheless reasonable if a higher layer
cache receives the aggregation of independent, low intensity overflows from many
low layer cache instances. Moreover, even for the simple tandem of two caches
considered by Jelenkovic and Kang [13], the impact of correlation on hit rates was
shown to be slight.

We conclude from the above that the IRM is a reasonable basis for evaluating 
cache performance, as long as care is taken in specifying the popularity law. 

3. The Che approximation 

We present the Che approximation and demonstrate its accuracy in comparison 
to the results of simulation. 

3.1. A characteristic time. Consider a cache with capacity for C objects under
the LRU replacement policy. We introduce the following random variables, for
t > 0,

$$X_n(t) = \sum_{i=1,\, i \neq n}^{N} \mathbf{1}_{\{\tau_i < t\}}$$

and

$$T_C(n) = \inf\{t > 0 : X_n(t) = C\}.$$

$X_n(t)$ is the number of different objects requested up to time t, excluding object
n, and $T_C(n)$ is the time at which exactly C different objects, other than n, have
been requested.

Without loss of generality, suppose a request for object n occurs at time 0. The
next request for object n will be a hit if fewer than C other objects are requested in
$(0, \tau_n)$ (recall that $\tau_i$ is the generic, exponentially distributed inter-request interval
for object i). In other words, there is a hit if $X_n(\tau_n) < C$. Now,

$$\{X_n(\tau_n) < C\} = \{T_C(n) > \tau_n\}$$

so that

$$h(n) = P(T_C(n) > \tau_n) = E\left(1 - e^{-q(n) T_C(n)}\right).$$

Since at $T_C(n)$ there are exactly C objects in the cache, we have

$$C = \sum_{i=1,\, i \neq n}^{N} \mathbf{1}_{\{\tau_i < T_C(n)\}},$$

and taking expectations,

$$C = \sum_{i=1,\, i \neq n}^{N} E\left(1 - e^{-q(i) T_C(n)}\right).$$



A first approximation of Che et al. [3] is to assume for large C that the $T_C(n)$ are
nearly deterministic. They replace the random variable $T_C(n)$ by the constant $t_C(n)$
that solves

$$C = \sum_{i=1,\, i \neq n}^{N} \left(1 - e^{-q(i) t}\right)$$

and approximate the hit rates by

$$h(n) = 1 - e^{-q(n) t_C(n)}.$$

A second approximation in [3] is to assume $t_C(n) = t_C$ for $1 \le n \le N$, where $t_C$
solves

$$C = \sum_{1 \le i \le N} \left(1 - e^{-q(i) t}\right).$$

This is arguably reasonable when individual popularities $q(n)$ are small relative to
the sum $\sum_n q(n)$. Having verified numerically that this second approximation is
generally very accurate, we adopt it for the remainder of the paper. This simplifies
notation but it should be noted that our analysis could readily be adapted to
preserve the dependence on n.

In summary, the Che approximation considered in this paper is as follows. Let
$t_C$ be the unique root of the equation

$$C = \sum_{i=1}^{N} \left(1 - e^{-q(i) t}\right). \qquad (1)$$

The hit rate $h(n)$ for object n, for $1 \le n \le N$, is then

$$h(n) = 1 - e^{-q(n) t_C}. \qquad (2)$$

Che et al. refer to $t_C$ as the "characteristic time" of the cache.

Random variables $X(t)$ and $T_C$ are defined as above without excluding object n.
In particular, $X(t) = \sum_{i=1}^{N} \mathbf{1}_{\{\tau_i < t\}}$. Since the Bernoulli events in the summation
are independent, we readily derive the mean and variance of $X(t)$:

$$m(t) = \sum_{i=1}^{N} \left(1 - e^{-q(i) t}\right), \qquad (3)$$

$$\sigma(t)^2 = \sum_{i=1}^{N} e^{-q(i) t}\left(1 - e^{-q(i) t}\right). \qquad (4)$$
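As a minimal illustration of equations (1) to (4), the following Python sketch solves (1) for the characteristic time by bisection and then evaluates the hit rates (2) together with the mean and variance of X(t). The function names, the bisection tolerance and the example parameters are assumptions made for this sketch; they are not taken from [3].

```python
import math

def mean_x(q, t):
    """m(t) of equation (3): expected number of distinct objects requested by time t."""
    return sum(1.0 - math.exp(-qi * t) for qi in q)

def var_x(q, t):
    """sigma(t)^2 of equation (4)."""
    return sum(math.exp(-qi * t) * (1.0 - math.exp(-qi * t)) for qi in q)

def characteristic_time(q, C, rel_tol=1e-12):
    """Root t_C of m(t) = C by bisection; assumes 0 < C < len(q) and all q[i] > 0."""
    lo, hi = 0.0, 1.0
    while mean_x(q, hi) < C:      # m(t) is increasing: double hi until the root is bracketed
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_x(q, mid) < C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def che_hit_rates(q, C):
    """Hit rates h(n) = 1 - exp(-q(n) t_C) of equation (2)."""
    t_c = characteristic_time(q, C)
    return [1.0 - math.exp(-qi * t_c) for qi in q]

# Illustrative parameters: Zipf(0.8) popularity, N = 1000 objects, C = 100.
N, C, alpha = 1000, 100, 0.8
q = [n ** -alpha for n in range(1, N + 1)]   # popularities need not be normalized
t_c = characteristic_time(q, C)
h = che_hit_rates(q, C)
print(f"t_C = {t_c:.3f}, h(1) = {h[0]:.3f}, h(N) = {h[-1]:.4f}")
print(f"sum of hit rates = {sum(h):.6f}, variance of X(t_C) = {var_x(q, t_c):.2f}")
```

By construction the hit rates sum to C, which gives a quick sanity check on the root; for a uniform law every object gets hit rate C/N.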



3.2. Variable sized objects. So far we have assumed the cache capacity is
measured in objects. In reality content objects have different sizes and cache capacity
is more reasonably measured in bytes. Suppose object n has size $\theta(n)$. Since the
cache is intended to store a very large number of objects, $\theta(n) \ll C$ so that we can
reasonably ignore boundary effects and adapt the Che approximation by replacing
(1) with

$$C = \sum_{i=1}^{N} \left(1 - e^{-q(i) t}\right) \theta(i). \qquad (5)$$

The hit rates are still given by (2).

An alternative way to account for variable size objects is to assume they are
divided into constant sized chunks. This is the principle of content-oriented Internet
architectures like CCN [10]. Let $\theta(n)$ be given in chunks and assume all chunks
inherit the popularity $q(n)$ of their parent object. Applying the Che approximation
to chunks, it is easy to see that equation (1) for $t_C$ is then precisely the same as
(5).

For some types of objects, it may be that chunks of the same object have different 
popularities (e.g., the first chunks of a video will be viewed more often than the 
last chunks). It makes more sense in this case to directly postulate a popularity 
law for chunks rather than objects. 

We conclude that the Che approximation presented in Section 3.1 is appropriate
also for variable size objects. It is not necessary to complicate the analysis in the
next sections by introducing the size $\theta(n)$ (although using (5) might be useful in
practice, as in [7]).
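The size-weighted variant (5) changes only the function whose root is sought. The sketch below is one possible way to solve it, again by bisection; the object sizes, popularity law and capacity are invented for illustration and the function name is hypothetical.

```python
import math

def characteristic_time_sized(q, theta, capacity, rel_tol=1e-12):
    """Root of equation (5): capacity = sum_i (1 - exp(-q[i] t)) * theta[i].

    Assumes 0 < capacity < sum(theta) and all q[i] > 0.
    """
    def occupied(t):
        return sum((1.0 - math.exp(-qi * t)) * th for qi, th in zip(q, theta))
    lo, hi = 0.0, 1.0
    while occupied(hi) < capacity:     # occupied(t) is increasing in t
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if occupied(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative mix: 500 small popular objects followed by 500 large unpopular ones.
q = [n ** -0.8 for n in range(1, 1001)]
theta = [1.0] * 500 + [20.0] * 500         # sizes in arbitrary units (assumed values)
capacity = 1000.0
t_c = characteristic_time_sized(q, theta, capacity)
h = [1.0 - math.exp(-qi * t_c) for qi in q]   # hit rates still follow equation (2)
used = sum(hn * th for hn, th in zip(h, theta))
print(f"t_C = {t_c:.3f}, expected occupancy = {used:.3f}")
```

With equal sizes the equation reduces to (1) with the capacity expressed in objects, which is the chunk argument made above.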

3.3. Accuracy. The accuracy of the approximation is typified by the results shown
in Figure 2. Figures 2a and 2b plot the hit rates of objects ranked 1, 10, 100 and
1000 from a population of 10000, assuming Zipf popularity with $\alpha = 0.8$ and
$\alpha = 1.2$, respectively. The crosses are the results of simulations with sufficiently
long runs to ensure their high accuracy. The lines are derived from the Che
approximation. Agreement is perfect, for all practical purposes.

Figure 2c confirms the approach is accurate also for a case where the intuitive
arguments of Che et al. do not apply. The population is only 100 and the popularity
law is geometric with parameter p = 0.9. The figure plots the hit rates of objects
ranked 1, 4, 16 and 64. Discrepancies are visible only for object 1 and these are
very slight.

Figure 2. Hit rate against cache size for selected objects, LRU
replacement
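A comparison of this kind can be reproduced on a small scale. The sketch below is not the authors' simulator: it is a minimal LRU simulation under the IRM, with a population, cache size, run length and seed chosen here only so that it runs quickly, set against hit rates computed from (1) and (2).

```python
import math
import random
from collections import OrderedDict

def che_hit_rates(q, C, rel_tol=1e-12):
    """Che approximation: solve (1) by bisection, then apply (2)."""
    m = lambda t: sum(1.0 - math.exp(-qi * t) for qi in q)
    lo, hi = 0.0, 1.0
    while m(hi) < C:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if m(mid) < C else (lo, mid)
    t_c = 0.5 * (lo + hi)
    return [1.0 - math.exp(-qi * t_c) for qi in q]

def simulate_lru(q, C, n_requests, warmup, seed=1):
    """Hits and requests per object for an LRU cache fed by IRM requests."""
    rng = random.Random(seed)
    cache = OrderedDict()                   # keys ordered from least to most recently used
    hits, reqs = [0] * len(q), [0] * len(q)
    stream = rng.choices(range(len(q)), weights=q, k=warmup + n_requests)
    for k, n in enumerate(stream):
        hit = n in cache
        if hit:
            cache.move_to_end(n)            # refresh recency
        else:
            cache[n] = True
            if len(cache) > C:
                cache.popitem(last=False)   # evict the least recently used object
        if k >= warmup:                     # ignore the initial filling of the cache
            reqs[n] += 1
            hits[n] += hit
    return hits, reqs

# Illustrative parameters: Zipf(0.8), N = 200, C = 20.
N, C = 200, 20
q = [n ** -0.8 for n in range(1, N + 1)]
h_che = che_hit_rates(q, C)
hits, reqs = simulate_lru(q, C, n_requests=300_000, warmup=10_000)
overall_sim = sum(hits) / sum(reqs)
overall_che = sum(qi * hi for qi, hi in zip(q, h_che)) / sum(q)
print(f"object 1: simulated {hits[0] / reqs[0]:.3f}, Che {h_che[0]:.3f}")
print(f"overall : simulated {overall_sim:.3f}, Che {overall_che:.3f}")
```

The overall hit rate is the popularity-weighted average of the per-object hit rates, which is why the Che value is computed as a weighted sum.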

4. Why the approximation works 

We explain why the Che approximation works so well, even for a small object 
population and a small cache size. 

4.1. A Gaussian approximation for $X(t)$. Rather than attempting to study
$T_C$ directly, we consider $X(t)$ and exploit the elementary relation
$P(T_C > t) = P(X(t) < C)$, for t > 0. Since the variable $X(t)$ is a sum of independent random
variables, it is natural to expect its distribution to be approximately Gaussian.

Figure 3 plots the distribution of $X(t)$ for some values of t when the popularity
distribution $q(n)$ is: (a) Zipf(0.8) for $1 \le n \le 10^4$; (b) Zipf(1.2) for $1 \le n \le 10^4$;
and (c) Geo(0.9) for $1 \le n \le 100$. The figures plot simulation results as crosses
with a superposed normal distribution of mean and variance given by (3) and (4),
respectively. The figures clearly suggest that $X(t)$ is indeed Gaussian for the range
of popularity laws and times t of interest.
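The Gaussian behaviour of X(t) can be checked numerically in a few lines. The sketch below draws X(t) as a sum of independent Bernoulli indicators and compares the empirical distribution function with a normal law whose mean and variance come from (3) and (4). The sample size, the value of t, the seed and the continuity correction of 1/2 are choices made for this illustration.

```python
import math
import random

def moments(q, t):
    """Mean m(t) and variance sigma(t)^2 of X(t), equations (3) and (4)."""
    p = [1.0 - math.exp(-qi * t) for qi in q]
    return sum(p), sum(pi * (1.0 - pi) for pi in p)

def sample_x(q, t, n_samples, seed=1):
    """Draw X(t) as a sum of independent indicators 1{tau_i < t}."""
    rng = random.Random(seed)
    p = [1.0 - math.exp(-qi * t) for qi in q]
    return [sum(rng.random() < pi for pi in p) for _ in range(n_samples)]

def normal_cdf(x, mean, std):
    return 0.5 * math.erfc(-(x - mean) / (std * math.sqrt(2.0)))

# Illustrative case: Zipf(0.8), N = 500, t = 20.
q = [n ** -0.8 for n in range(1, 501)]
t = 20.0
m, var = moments(q, t)
std = math.sqrt(var)
xs = sample_x(q, t, n_samples=2000)
emp_mean = sum(xs) / len(xs)
emp_var = sum((x - emp_mean) ** 2 for x in xs) / (len(xs) - 1)

# Largest gap between the empirical and Gaussian distribution functions,
# with a continuity correction of 1/2 because X(t) is integer valued.
gap = max(abs(sum(x <= k for x in xs) / len(xs) - normal_cdf(k + 0.5, m, std))
          for k in range(len(q) + 1))
print(f"mean {emp_mean:.2f} vs {m:.2f}, variance {emp_var:.2f} vs {var:.2f}, gap {gap:.3f}")
```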



Figure 3. Distribution of $X(t)$, simulation and Gaussian approximation

4.2. A central limit theorem. The following proposition establishes conditions
under which a Gaussian approximation for $X(t)$ is reasonable.

Proposition 1. If $W(t) := (X(t) - m(t))/\sigma(t)$, where $m(t)$ and $\sigma(t)$ are given by
(3) and (4) respectively, then

$$\|\mathcal{L}(W(t)) - \mathcal{L}(G)\| := \sup_{x} \left|P(W(t) \le x) - P(G \le x)\right| \le \frac{K}{\sigma(t)},$$

where G is a centred normal random variable and $K \le 0.56$.

Proof. Let $Z_n(t) := [\mathbf{1}_{\{\tau_n < t\}} - (1 - \exp(-q(n)t))]/\sigma(t)$ and

$$Z(t) := \sum_{n=1}^{N} Z_n(t) = W(t).$$

Then $E(Z_n(t)) = 0$, $E(Z(t)^2) = 1$ and we have

$$\sigma(t)^3 E\left(|Z_n(t)|^3\right) = e^{-3q(n)t}\left(1 - e^{-q(n)t}\right) + \left(1 - e^{-q(n)t}\right)^3 e^{-q(n)t}.$$

Hence,

$$\sum_{n=1}^{N} E\left(|Z_n(t)|^3\right) \le \frac{1}{\sigma(t)}.$$

Berry-Esseen's inequality, see Feller [3 p. 544], gives the relation

$$\|\mathcal{L}(W(t)) - \mathcal{L}(G)\| \le K \sum_{n=1}^{N} E\left(|Z_n(t)|^3\right) \le \frac{K}{\sigma(t)},$$

where K is a constant. It has recently been shown that the value of K is no greater
than 0.56 [17]. □

This proposition confirms that $X(t)$ is asymptotically Gaussian as $t \to \infty$ if
$\sigma(t)$ grows unboundedly. It is not wholly satisfactory, however, in that it does not
explain the excellent fit illustrated in Figure 3 for small values of t and for geometric
popularity where $\sigma(t)$ is not an increasing function. This appears to be a normal
situation for central limit theorems where convergence to the normal distribution
is often much better than predicted by the analytical bounds.

4.3. Approximating the hit rates. Starting with the Gaussian approximation
for the distribution of $X(t)$, we argue that the Che approximation is indeed generally
applicable.

Proposition 2. Assuming $X(t)$ is Gaussian of mean $m(t)$ and standard deviation
$\sigma(t)$, the hit rate for object n may be written

$$h(n) = 1 - \int_0^{+\infty} \frac{1}{2}\operatorname{erfc}\left(\frac{C - m(u)}{\sqrt{2}\,\sigma(u)}\right) q(n) e^{-q(n)u}\,du, \qquad (6)$$

where erfc(x) is the complementary error function.

Proof. Recall that $h(n) = 1 - E(e^{-q(n)T_C})$. Consider $E(e^{-qT_C})$ for some q > 0.
By definition of $T_C$,

$$E\left(e^{-qT_C}\right) = \int_0^{+\infty} P(T_C \le u)\, q e^{-qu}\,du = \int_0^{+\infty} P(X(u) \ge C)\, q e^{-qu}\,du.$$

By the assumption that $X(t)$ is Gaussian, we can write

$$P(X(u) \ge C) = \frac{1}{2}\operatorname{erfc}\left(\frac{C - m(u)}{\sqrt{2}\,\sigma(u)}\right)$$

and the proposition follows. □

Proposition 2 could be used directly to evaluate the hit rates. Instead, we derive
the Che approximation as an approximation for the integral in (6). Note that
the complementary error function in the integrand is an S-shaped function tending
rapidly to asymptotes at 0 and 1 from a point of inflection at $m(u) = C$, i.e., at
$u = t_C$. We therefore replace this function in the integral by the step function
$\mathbf{1}_{\{m(u) > C\}}$, yielding the following approximation:

$$E\left(e^{-qT_C}\right) \approx \int_0^{+\infty} \mathbf{1}_{\{m(u) > C\}}\, q e^{-qu}\,du = \int_0^{+\infty} \mathbf{1}_{\{u > t_C\}}\, q e^{-qu}\,du = e^{-q t_C}.$$

The second step follows from $m(t_C) = C$ and the fact that $m(u)$ is increasing in u.
This establishes the validity of the Che approximation on condition that replacing
erfc by a step function is accurate.

Figure 4. Approximating the integral

This accuracy is illustrated in Figure 4 for a particular set of parameters: Zipf(0.8)
popularity, N = 10000, C = 100 and g = .83 x 10"^. The exact integral is the 
shaded area. The approximation replaces this by the area to the right of $t_C$ and
under the exponential curve. Visibly, the approximation is good in this case. It is
easy to convince oneself that this is generally true for the popularity laws of
interest. However, it unfortunately does not seem possible to quantify the error due to
the non-explicit nature of the erfc function argument in Proposition 2.
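Although the error is hard to bound analytically, it is easy to measure for a given popularity law. The sketch below evaluates the Gaussian integral of Proposition 2 numerically and sets it beside the step-function value, i.e., the Che hit rate. The change of variable v = exp(-q u), the midpoint rule, the number of quadrature points and the test parameters are all choices made for this illustration.

```python
import math

def moments(q, u):
    """Mean and variance of X(u), equations (3) and (4)."""
    p = [1.0 - math.exp(-qi * u) for qi in q]
    return sum(p), sum(pi * (1.0 - pi) for pi in p)

def che_time(q, C, rel_tol=1e-12):
    """Characteristic time: root of m(t) = C by bisection (assumes 0 < C < len(q))."""
    lo, hi = 0.0, 1.0
    while moments(q, hi)[0] < C:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if moments(q, mid)[0] < C else (lo, mid)
    return 0.5 * (lo + hi)

def gaussian_hit_rate(q, C, qn, n_points=400):
    """Hit rate from the Gaussian integral, using the midpoint rule.

    The substitution v = exp(-qn * u) maps u in (0, inf) to v in (0, 1) and
    absorbs the factor qn * exp(-qn * u) of the integrand.
    """
    total = 0.0
    for k in range(n_points):
        v = (k + 0.5) / n_points
        u = -math.log(v) / qn
        m, var = moments(q, u)
        if var > 0.0:
            total += 0.5 * math.erfc((C - m) / math.sqrt(2.0 * var))
        else:                          # degenerate case: X(u) is deterministic
            total += 1.0 if m >= C else 0.0
    return 1.0 - total / n_points

# Illustrative parameters: Zipf(0.8), N = 500, C = 50, objects ranked 1, 10 and 100.
N, C = 500, 50
q = [n ** -0.8 for n in range(1, N + 1)]
t_c = che_time(q, C)
results = []
for rank in (1, 10, 100):
    qn = q[rank - 1]
    results.append((rank, gaussian_hit_rate(q, C, qn), 1.0 - math.exp(-qn * t_c)))
for rank, h_gauss, h_che in results:
    print(f"rank {rank:3d}: Gaussian integral {h_gauss:.4f}, Che {h_che:.4f}")
```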

5. Zipf popularity and a large cache

For Zipf law popularity, we can prove the asymptotic validity of the Che
approximation and characterize $t_C$ directly, without solving equation (1).

5.1. Preliminary results. We first prove two lemmas about the moments of $X(t)$.

Lemma 1. For t > 0, the variance of $X(t)$ can be expressed in terms of its mean:

$$\sigma(t)^2 = m(2t) - m(t).$$

Proof. From (4),

$$\sigma(t)^2 = \sum_{n=1}^{N} \left(1 - e^{-2q(n)t}\right) - \sum_{n=1}^{N} \left(1 - e^{-q(n)t}\right),$$

which yields the desired identity. □

Now consider the behaviour of the average of $(X(\cdot))$ at an appropriate time scale.

Lemma 2. With $q(n) = 1/n^{\alpha}$ for $1 \le n \le N$, for any $\beta > 0$,

$$E\left(X(\beta N^{\alpha})\right) = \psi_\alpha(\beta) N + o(N),$$

where

$$\psi_\alpha(\beta) = 1 - \int_0^1 e^{-\beta/x^{\alpha}}\,dx.$$

Proof.

$$E\left(X(\beta N^{\alpha})\right) = N \times \frac{1}{N}\sum_{n=1}^{N} \left(1 - e^{-\beta/(n/N)^{\alpha}}\right) = N \left(\int_0^1 \left(1 - e^{-\beta/x^{\alpha}}\right) dx\right) + o(N).$$

□

5.2. A Gaussian approximation for $T_C$. In the particular case of Zipf law
popularity we show that $T_C$ is asymptotically Gaussian and derive $t_C$ as its expectation.
In the following $\lfloor z \rfloor$ denotes the integer part of z > 0.

Proposition 3. For Zipf($\alpha$) popularity, as N and C tend to infinity with
$C = \lfloor \delta N \rfloor$, for $0 < \delta < 1$, we have

$$t_{\lfloor \delta N \rfloor} = \psi_\alpha^{-1}(\delta) N^{\alpha} + o(N^{\alpha}), \qquad (7)$$

where $\psi_\alpha(\beta)$ is defined in Lemma 2.
Furthermore, the random variable

$$\frac{T_{\lfloor \delta N \rfloor} - t_{\lfloor \delta N \rfloor}}{N^{\alpha - 1/2}}$$

converges in distribution to a centred Gaussian random variable.

Proof. The asymptotic relation for $t_C$ is a direct consequence of Lemma 2. We
have

$$E\left(X(\beta N^{\alpha})\right) = \psi_\alpha(\beta) N + o(N)$$

and, by definition of $t_C$,

$$E(X(t_C)) = \lfloor \delta N \rfloor = \delta N + o(N).$$

Let $s_C = \psi_\alpha^{-1}(\delta) N^{\alpha}$ so that

$$E(X(s_C)) = \psi_\alpha\left(\psi_\alpha^{-1}(\delta)\right) N + o(N) = \delta N + o(N).$$

It follows that $t_C = s_C + o(N^{\alpha})$ and the first statement of the proposition is proved.

Now consider the following relation, for $x \in \mathbb{R}$,

$$P(T_C - t_C > x) = P(X(t_C + x) < C) = P\left(\frac{X(t_C + x) - m(t_C + x)}{\sigma(t_C + x)} < \frac{m(t_C) - m(t_C + x)}{\sqrt{m(2(t_C + x)) - m(t_C + x)}}\right).$$

The second equality follows on setting $m(t_C) = C$ and applying Lemma 1.

Since for the Zipf laws, $\sigma(\beta N^{\alpha})$ goes to infinity as $N \to \infty$, Proposition 1 shows
that the variable

$$\frac{X(\beta N^{\alpha}) - m(\beta N^{\alpha})}{\sigma(\beta N^{\alpha})}$$

is arbitrarily close to a centred normal variable if N is sufficiently large. The only
thing to check is the asymptotic behaviour of the fraction

$$\frac{m(t_C) - m(t_C + x)}{\sqrt{m(2(t_C + x)) - m(t_C + x)}}$$

when $x = \beta N^{\alpha - 1/2}$.

Using Lemma 2, write $m(t_C + x)$ as

$$m(t_C + x) = m\left(N^{\alpha}\left(\psi_\alpha^{-1}(\delta) + \frac{x}{N^{\alpha}} + o(1)\right)\right).$$

Expanding $\psi_\alpha$ about $\psi_\alpha^{-1}(\delta)$ yields the numerator

$$m(t_C) - m(t_C + x) = -\psi_\alpha'\left(\psi_\alpha^{-1}(\delta)\right) N^{1-\alpha} x + o(N).$$

The denominator can similarly be written

$$\sqrt{m(2(t_C + x)) - m(t_C + x)} = \sqrt{\psi_\alpha\left(2\psi_\alpha^{-1}(\delta)\right) N - \delta N + o(N)}.$$

The second statement of the proposition follows on substituting for x.

□
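The asymptotic expression (7) can be compared with the exact root of (1) for a finite population. The sketch below evaluates the function of Lemma 2 by a midpoint rule, inverts it by bisection and compares the resulting estimate of the characteristic time with the root of (1). The quadrature resolution, tolerances and the choice N = 10000, C = 1000 are assumptions of this illustration.

```python
import math

def psi(alpha, beta, n_points=4000):
    """psi_alpha(beta) = 1 - integral over (0, 1) of exp(-beta / x**alpha), midpoint rule."""
    s = sum(math.exp(-beta / ((k + 0.5) / n_points) ** alpha) for k in range(n_points))
    return 1.0 - s / n_points

def psi_inverse(alpha, delta, rel_tol=1e-10):
    """Solve psi_alpha(beta) = delta; psi_alpha increases from 0 to 1 with beta."""
    lo, hi = 0.0, 1.0
    while psi(alpha, hi) < delta:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if psi(alpha, mid) < delta else (lo, mid)
    return 0.5 * (lo + hi)

def characteristic_time(q, C, rel_tol=1e-10):
    """Root of equation (1) by bisection (assumes 0 < C < len(q))."""
    m = lambda t: sum(1.0 - math.exp(-qi * t) for qi in q)
    lo, hi = 0.0, 1.0
    while m(hi) < C:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if m(mid) < C else (lo, mid)
    return 0.5 * (lo + hi)

# Illustrative parameters: Zipf(0.8), N = 10000, C = 1000, so delta = 0.1.
alpha, N, C = 0.8, 10_000, 1000
q = [n ** -alpha for n in range(1, N + 1)]
t_exact = characteristic_time(q, C)
t_asymptotic = psi_inverse(alpha, C / N) * N ** alpha
print(f"root of (1): {t_exact:.3f}, asymptotic estimate (7): {t_asymptotic:.3f}")
```

Note that the popularities are left unnormalized, as in Lemma 2; normalizing them would simply rescale the characteristic time.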

Remark. The expression for $t_{\lfloor \delta N \rfloor}$ in Proposition 3 coincides with an equivalent
quantity derived differently in Theorems 2 and 3 of Jelenkovic et al. for $\alpha = 1$
and $\alpha < 1$, respectively. Theorem 1 of the same paper provides an explicit
expression for $t_C$ when $\alpha > 1$ and N is infinite. This expression proves significantly
less precise than (7) when $\alpha$ is not much greater than 1 (1.2, say), even for N as
large as 10000.

5.3. The Che approximation. Proposition 3 shows that there is a function $\vartheta(N)$
such that, as $N \to \infty$, $\vartheta(N)/t_C \to 0$ while $(T_C - t_C)/\vartheta(N)$ converges to a Gaussian
random variable. Thus in this case, $T_C$ does indeed become deterministic (i.e.,
$T_C/t_C \to 1$) and the original argument of Che et al. applies. We have
$E(e^{-qT_C}) \to e^{-q t_C}$ as C (and N) $\to \infty$.

5.4. Geometric popularity. It can be shown for geometric popularity, $q(n) = p^n$
for n > 0, that $m(t) = -\log t/\log p + O(1)$. Thus, by Lemma 1,
$\sigma(t)^2 = -\log 2/\log p + O(1)$, i.e., the variance of $X(t)$ is asymptotically constant and small
compared to $m(t)$. This explains why the Che approximation works for geometric
popularity (applying the arguments in Section 4) although $T_C$ is by no means
deterministic.
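Both Lemma 1 and the behaviour claimed for geometric popularity are easy to check numerically. The following sketch, with p = 0.9 and a population truncated at 2000 objects (both assumptions of this illustration), evaluates the mean and variance of X(t) at widely separated times.

```python
import math

def mean_x(q, t):
    """m(t), equation (3)."""
    return sum(1.0 - math.exp(-qi * t) for qi in q)

def var_x(q, t):
    """sigma(t)^2, equation (4)."""
    return sum(math.exp(-qi * t) * (1.0 - math.exp(-qi * t)) for qi in q)

p = 0.9
q = [p ** n for n in range(1, 2001)]       # geometric popularity, truncated at n = 2000
limit = -math.log(2.0) / math.log(p)       # leading term of the variance
rows = []
for t in (1e3, 1e6, 1e9):
    m, var = mean_x(q, t), var_x(q, t)
    lemma1 = mean_x(q, 2.0 * t) - m        # Lemma 1: sigma(t)^2 = m(2t) - m(t)
    rows.append((t, m, var, lemma1))
    print(f"t = {t:.0e}: m = {m:7.2f}, -log t/log p = {-math.log(t) / math.log(p):7.2f}, "
          f"variance = {var:.4f}, m(2t) - m(t) = {lemma1:.4f}, -log 2/log p = {limit:.4f}")
```

The mean grows logarithmically in t while the variance stays essentially fixed, which is the situation described above.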

6. A "Che approximation" for random replacement 

The excellent accuracy of the Che approximation for LRU replacement motivates 
the search for a similar approach for other policies. In this section we consider 
random replacement and derive an approximation that is similar in accuracy and 
complexity to the Che approximation. 

Random might be preferred to LRU because it is simpler to implement. When
a new object is to be added to the cache, it overwrites a randomly chosen existing
object independently of the popularity of the latter. This policy was shown by
Gelenbe to have exactly the same hit rates as FIFO [8].

The exact analysis of Gelenbe is too complex for practical evaluation. A recent
paper by Simonian et al. [18] provides large cache asymptotics applicable for a Zipf
popularity law with $\alpha > 1$. Dan and Towsley [4] propose an approximate evaluation
for the hit rates of a FIFO cache.

Note that $h(n)$, the hit rate for object n, is the probability object n is in the
cache at an arbitrary instant. This can be expressed by Little's formula as the
product $\lambda(n) \times T(n)$, where $\lambda(n)$ is the frequency at which object n enters the cache
and $T(n)$ is its average sojourn time.

                share p_i   population N_i   size θ_i (chunks)   exponent α_i
  Web           .18         10^              10                  0.8
  File sharing  .36         10^              10^                 0.8
  UGC           .23         10^8             10^                 0.8
  VoD           .23         10^              10^                 1.2

Table 1. Internet content traffic characteristics

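Clearly, $\lambda(n) = (1 - h(n))q(n)$. We assume $T(n)$ is inversely proportional to the
arrival rate of requests for objects other than n. This is only approximately true
but is intuitively reasonable. Taking this rate to be nearly the same for every n
(individual popularities being small relative to the sum $\sum_n q(n)$, as in Section 3.1),
we write $T(n) = \tau_C$ and deduce $h(n) = (1 - h(n))q(n) \times \tau_C$, i.e.,

$$h(n) = \frac{q(n)\tau_C}{1 + q(n)\tau_C}, \qquad (8)$$

for some unknown constant $\tau_C$. Equating the sum of hit rates to the cache size C,
as in Section 3.1, yields the equation for $\tau_C$:

$$C = \sum_{n=1}^{N} \frac{q(n)\tau_C}{1 + q(n)\tau_C}. \qquad (9)$$

Equation (9) is the 'random' equivalent to the LRU Che identity (1). Solving for
$\tau_C$ yields the hit rates via (8).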

Figure 5 shows results analogous to those of Figure 2 for LRU. The accuracy
is clearly comparable. It largely remains to analyse why this is so but note that
the assumption that sojourn times are inversely proportional to the request rate of other objects
appears equally reasonable for all realistic popularity laws. The FIFO algorithm of
Dan and Towsley [4] is similarly accurate.
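The approximation for random replacement has the same computational shape as the LRU one: find the single constant whose induced hit rates, of the form x/(1 + x) with x the product of popularity and that constant, sum to the cache size. The sketch below solves that equation by bisection and compares the result with a small simulation of random replacement under the IRM. The simulator, its parameters and its seed are assumptions of this illustration, not the authors' code.

```python
import random

def random_hit_rates(q, C, rel_tol=1e-12):
    """Find tau with sum_n q(n) tau / (1 + q(n) tau) = C, then return the hit rates."""
    occupancy = lambda tau: sum(qi * tau / (1.0 + qi * tau) for qi in q)
    lo, hi = 0.0, 1.0
    while occupancy(hi) < C:               # occupancy increases from 0 to len(q)
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if occupancy(mid) < C else (lo, mid)
    tau = 0.5 * (lo + hi)
    return [qi * tau / (1.0 + qi * tau) for qi in q]

def simulate_random(q, C, n_requests, warmup, seed=1):
    """Hits and requests per object for random replacement under the IRM."""
    rng = random.Random(seed)
    slots, where = [], {}                  # cached objects and the slot each one occupies
    hits, reqs = [0] * len(q), [0] * len(q)
    stream = rng.choices(range(len(q)), weights=q, k=warmup + n_requests)
    for k, n in enumerate(stream):
        hit = n in where
        if not hit:
            if len(slots) < C:             # cache not yet full: use a free slot
                where[n] = len(slots)
                slots.append(n)
            else:
                j = rng.randrange(C)       # overwrite a uniformly chosen cached object
                del where[slots[j]]
                slots[j], where[n] = n, j
        if k >= warmup:
            reqs[n] += 1
            hits[n] += hit
    return hits, reqs

# Illustrative parameters: Zipf(0.8), N = 200, C = 20.
N, C = 200, 20
q = [n ** -0.8 for n in range(1, N + 1)]
h_rnd = random_hit_rates(q, C)
hits, reqs = simulate_random(q, C, n_requests=300_000, warmup=10_000)
overall_sim = sum(hits) / sum(reqs)
overall_apx = sum(qi * hi for qi, hi in zip(q, h_rnd)) / sum(q)
print(f"object 1: simulated {hits[0] / reqs[0]:.3f}, approximation {h_rnd[0]:.3f}")
print(f"overall : simulated {overall_sim:.3f}, approximation {overall_apx:.3f}")
```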

7. Application 

In this section we present an application that is intended to illustrate the power 
of the Che approximation. We revisit the networking example introduced in [7] 
where users retrieve a mixture of web, file sharing, user-generated content (UGC) 
and video-on-demand (VoD) content via a cache. 

Traffic characteristics and assumed popularity laws are presented in Table 1.
Note the very large populations and diverse popularity laws. These make other
performance evaluation approaches, including simulation, impractical.

We suppose objects are divided into 1 KB chunks. For the sake of simplicity,
objects of the same type i are supposed to have constant size $\theta_i$ chunks. We assume
objects have Zipf popularity with exponent $\alpha_i$ and chunks inherit the popularity of
their parent object. The proportion of type i traffic in bit/s downloaded by users
is $p_i$.

Figure 5. Hit rate against cache size for selected objects, random
replacement

Given these assumptions we deduce the popularity of chunk k of object n of type
i, for $1 \le i \le 4$, $1 \le n \le N_i$ and $1 \le k \le \theta_i$,
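One expression for the chunk popularity that is consistent with the assumptions of Section 7 can be written down directly. It is a sketch derived here from those assumptions, not necessarily the authors' exact normalization, and the notation $q_i(n,k)$ is introduced only for this purpose. Type i carries a share $p_i$ of chunk requests (all chunks have the same size), object n receives a Zipf share of the type-i requests, and each of its $\theta_i$ chunks is requested equally often:

```latex
% Sketch under the stated assumptions; q_i(n,k) is notation introduced here.
q_i(n,k) \;=\; \frac{p_i}{\theta_i}\,
  \frac{n^{-\alpha_i}}{\sum_{m=1}^{N_i} m^{-\alpha_i}},
\qquad 1 \le i \le 4,\quad 1 \le n \le N_i,\quad 1 \le k \le \theta_i .
```

Summing over k, n and i gives $\sum_i p_i = 1$, so the law is normalized, although the Che approximation does not require this.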



In applying the Che approximation we optimize summations over as many as
10^""^ objects in (1) and (9) by grouping successive terms which are nearly equal.
Computation is then very rapid.
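One way such a grouping might be implemented is sketched below for a single Zipf law: consecutive ranks whose popularities differ by at most about one percent are replaced by a single representative term multiplied by the group size. The grouping rule, the parameter eps and the example populations are assumptions of this sketch; the procedure actually used for the results reported here is not described in more detail.

```python
import math

def grouped_sum(term, N, eps=0.01):
    """Approximate sum_{n=1}^{N} term(n) by grouping consecutive ranks.

    Ranks a .. b-1, with b about a * (1 + eps), are replaced by (b - a) copies of
    the term evaluated at the middle rank; the first ranks form groups of size one.
    """
    total, a = 0.0, 1
    while a <= N:
        b = min(N + 1, max(a + 1, int(a * (1.0 + eps)) + 1))
        total += (b - a) * term(0.5 * (a + b - 1))
        a = b
    return total

def zipf_mean(alpha, N, t):
    """m(t) of equation (3) for q(n) = n**-alpha, evaluated with grouped terms."""
    return grouped_sum(lambda x: 1.0 - math.exp(-t * x ** -alpha), N)

def zipf_characteristic_time(alpha, N, C, rel_tol=1e-10):
    """Root of equation (1) by bisection on the grouped sum (assumes 0 < C < N)."""
    lo, hi = 0.0, 1.0
    while zipf_mean(alpha, N, hi) < C:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if zipf_mean(alpha, N, mid) < C else (lo, mid)
    return 0.5 * (lo + hi)

# Check against the plain sum on a population small enough to enumerate.
N_small, alpha, t = 200_000, 0.8, 500.0
exact = sum(1.0 - math.exp(-t * n ** -alpha) for n in range(1, N_small + 1))
approx = zipf_mean(alpha, N_small, t)
print(f"plain sum {exact:.3f}, grouped sum {approx:.3f}")

# A population far too large to enumerate (illustrative values only).
t_big = zipf_characteristic_time(alpha, N=10**11, C=10**6)
print(f"characteristic time for N = 1e11, C = 1e6: {t_big:.2f}")
```

The number of groups grows only logarithmically with N, so each evaluation of the sum costs a few thousand terms even for the largest population.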

Figure 6 compares the overall hit rate for the traffic mix as a function of cache
size for three different replacement policies: least frequently used (LFU), LRU and
random. The LFU hit rate is calculated as in [7] while for LRU and random we use
the Che approximations of Sections 3 and 6, respectively.

The significance of these and similar results is discussed in [7]. An additional
observation is that random is hardly worse than LRU in this case. Our main
objective in presenting these results is to stress that they are readily derived using
the Che approximation when, in view of the huge populations and diversity of
content objects, any other approach would be impracticable or inexact.

Figure 6. Hit rate against cache size (in bytes) for the traffic mix
of Table 1 with LFU, LRU and random replacement

8. Conclusion 

The Che approximation constitutes a versatile and highly accurate tool for
predicting the hit rate performance of a cache with LRU replacement. We have
demonstrated in the paper why the approximation works so well, even when the conditions
suggested by its authors are not satisfied. The analysis lends confidence to using
this tool to evaluate the performance of an information-centric network where the
large populations and diversity of content catalogues preclude utilization of
alternative approaches. Note, in particular, that the Che approximation can be usefully
combined with the approach in [16] to evaluate large-scale, general cache networks.